arXiv:1504.08352v1 [cs.CC] 30 Apr 2015


ETH Hardness for Densest-k-Subgraph with Perfect Completeness


Mark Braverman    Young Kun Ko    Aviad Rubinstein    Omri Weinstein


Abstract 

We show that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing
between a graph with an induced $k$-clique and a graph in which all $k$-subgraphs have density
at most $1 - \varepsilon$ requires $n^{\tilde{\Omega}(\log n)}$ time. Our result essentially matches the quasi-polynomial
algorithms of Feige and Seltser [FS97] and Barman [Bar15b] for this problem, and is the first
one to rule out an additive PTAS for Densest $k$-Subgraph. We further strengthen this result by
showing that our lower bound continues to hold when, in the soundness case, even subgraphs
smaller by a near-polynomial factor ($k' = k \cdot 2^{-\tilde{\Omega}(\log n)}$) are assumed to be at most $(1 - \varepsilon)$-dense.

Our reduction is inspired by recent applications of the “birthday repetition” technique 
[AIM14, BKW15]. Our analysis relies on information-theoretic machinery and is similar in
spirit to analyzing a parallel repetition of two-prover games in which the provers may choose to 
answer some challenges multiple times, while completely ignoring other challenges. 
1 Introduction


$k$-Clique is one of the most fundamental problems in computer science: given a graph, decide
whether it has a fully connected induced subgraph on $k$ vertices. Since it was proven NP-complete
by Karp [Kar72], extensive research has investigated the complexity of relaxed versions of this
problem.

This work focuses on two natural relaxations of $k$-Clique which have received significant attention
from both the algorithmic and complexity communities. The first one is to relax “$k$”, i.e. to look
for a smaller subgraph:

Problem 1.1 (Approximate Max Clique, Informal). Given an $n$-vertex graph $G$, decide whether $G$
contains a clique of size $k$, or all induced cliques of $G$ are of size at most $\delta k$ for some $1 > \delta(n) > 0$.

The second natural relaxation is to relax the “Clique” requirement, replacing it with the more 
modest goal of finding a subgraph that is almost a clique: 

Problem 1.2 (Densest $k$-Subgraph with perfect completeness, Informal). Given an $n$-vertex graph
$G$ containing a clique of size $k$, find an induced subgraph of $G$ of size $k$ with (edge) density at least
$(1 - \varepsilon)$, for some $1 > \varepsilon > 0$. (More modestly, given an $n$-vertex graph $G$, decide whether $G$ contains
a clique of size $k$, or all induced $k$-subgraphs of $G$ have density at most $(1 - \varepsilon)$.)

Today, after a long line of research [FGL+96, AS98, ALM+98, Has99, Kho01, Zuc07], we have
a solid understanding of the inapproximability of Problem 1.1. In particular, we know that it is
NP-hard to distinguish between a graph that has a clique of size $k$, and a graph whose largest
induced clique is of size at most $k' = \delta k$ for $\delta = 1/n^{1-\varepsilon}$ [Zuc07]. The computational complexity of
the second relaxation (Problem 1.2) remained largely open. There are a couple of quasi-polynomial
algorithms that guarantee finding a $(1 - \varepsilon)$-dense $k$-subgraph in every graph containing a $k$-clique
[FS97, Bar15b], suggesting that this problem is not NP-hard. Yet we know neither polynomial-time
algorithms, nor general impossibility results for this problem.

In this work we provide strong evidence that the aforementioned quasi-polynomial time algorithms
for Problem 1.2 [FS97, Bar15b] are essentially tight, assuming the (deterministic) Exponential
Time Hypothesis (ETH), which postulates that any deterministic algorithm for 3SAT requires
$2^{\Omega(n)}$ time [IP01]. In fact, we show that under ETH, both parameters of the above relaxations are
simultaneously hard to approximate:

Theorem 1.3 (Main Result). There exists a universal constant $\varepsilon > 0$ such that, assuming the
(deterministic) Exponential Time Hypothesis, distinguishing between the following requires time
$n^{\tilde{\Omega}(\log n)}$, where $n$ is the number of vertices of $G$.

Completeness: $G$ has an induced $k$-clique; and

Soundness: every induced subgraph of $G$ of size $k' = k \cdot 2^{-\Omega\left(\frac{\log n}{\log\log n}\right)}$ has density at most $1 - \varepsilon$.


Our result has implications for two major open problems whose computational complexity
remained elusive for more than two decades: the (general) Densest $k$-Subgraph problem, and
the Planted Clique problem.

(Footnote 1: Barman [Bar15b] approximates the Densest $k$-Bi-Subgraph problem. Densest $k$-Subgraph can be handled
via a simple modification [Bar15a].)

The Densest $k$-Subgraph problem, DkS$(\eta, \varepsilon)$, is the same as (the decision version of) Problem 1.2,
except that in the “completeness” case, $G$ has a $k$-subgraph with density $\eta$, and in the
“soundness” case, every $k$-subgraph is of density at most $\varepsilon$, where $\eta \gg \varepsilon$. Since Problem 1.2 is
a special case of this problem, our main theorem can also be viewed as a new inapproximability 
result for DkS$(1, 1 - \varepsilon)$. We remark that the aforementioned quasi-polynomial algorithms for the
“perfect completeness” regime completely break in the sparse regime, and indeed it is believed that 
DkS with polynomially small densities (and polynomially large $k$) in fact requires much more than quasi-polynomial time.
The best algorithm to date for Densest $k$-Subgraph, due to Bhaskara et al., is guaranteed to find
a $k$-subgraph whose density is within an $\sim n^{1/4}$-multiplicative factor of the densest subgraph of
size $k$ [BCV+12], and thus DkS$(\eta, \varepsilon)$ can be solved efficiently whenever $\eta \gg n^{1/4} \cdot \varepsilon$ (this improved
upon a previous $n^{1/3}$-approximation of Feige et al. [FKP01]). Making further progress on either
the lower or upper bound frontier of the problem is a major open problem.

Several inapproximability results for Densest $k$-Subgraph were known against specific classes
of algorithms [BCV+12] or under assumptions that are incomparable or stronger (thus giving weaker
hardness results) than ETH: $\mathrm{NP} \not\subseteq \mathrm{BPTIME}(2^{n^{\varepsilon}})$ [Kho06], Unique Games with expansion
[RS10], and hardness of random $k$-CNF [Fei02, AAM+11]. The most closely related result is by
Khot [Kho06], who shows that the Densest $k$-Subgraph problem has no PTAS unless SAT is
in randomized subexponential time. The result of [Kho06], as well as other aforementioned works,
focuses on the sub-constant density regime, i.e. they show hardness for distinguishing between a
graph where every $k$-subgraph is sparse, and one where every $k$-subgraph is extremely sparse. In
contrast, our result has perfect completeness and provides the first additive inapproximability for
Densest $k$-Subgraph, the best one can hope for as per the upper bound of [Bar15b].

The Planted Clique problem is a special case of our problem, where the inputs come from a
specific distribution ($G(n, p)$ versus $G(n, p)$ + “a planted clique of size $k$”, where $p$ is some constant,
typically 1/2). The Planted Clique Conjecture [AAK+07, AKS98, Jer92, Kuc95, FK00, DGGP10]
asserts that distinguishing between the aforementioned cases for $p = 1/2$, $k = o(\sqrt{n})$ cannot be
done in polynomial time, and has served as the underlying hardness assumption in a variety of
recent applications including machine learning and cryptography (e.g. [AAK+07, BBB+13, BR13])
that inherently use the average-case nature of the problem, as well as in reductions to worst-case
problems (e.g. [HK11, AAM+11, CLLR15, BPR15b]).

The main drawback of average-case hardness assumptions is that many average-case instances 
(even those of worst-case-hard problems) are in fact tractable. In recent years, the centrality of 
the planted clique conjecture inspired several works that obtain lower bounds in restricted models 
of computation [FGR+13, MPW15, DM15]. Nevertheless, a general lower bound for the average-case
planted clique problem appears out of reach for existing lower bound techniques. Therefore,
an important potential application of our result is replacing average-case assumptions such as the 
planted-clique conjecture, in applications that do not inherently rely on the distributional nature 
of the inputs (e.g., when the ultimate goal is to prove a worst-case hardness result). In such 
applications, there is a good chance that planted clique hardness assumptions can be replaced with 
a more “conventional” hardness assumption, such as the ETH, even when the problem has a
quasi-polynomial algorithm. Recently, such a replacement of the planted clique conjecture with ETH was
obtained for the problem of finding an approximate Nash equilibrium with approximately optimal
social welfare [BKW15].

We also remark that, while showing hardness for Planted Clique from worst-case assumptions 
seems beyond the reach of current techniques, our result can also be seen as circumstantial evidence 
that this problem may indeed be hard. In particular, any polynomial time algorithm (if one exists)
would have to inherently use the (rich and well-understood) structure of $G(n, p)$.
Techniques

Our simple construction is inspired by the “birthday repetition” technique which appeared recently
in [AIM14, BKW15, BPR15a]: given a 2CSP (e.g. 3COL), we have a vertex for each $\tilde{O}(\sqrt{n})$-tuple
of variables and assignments (respectively, 3COL vertices and colorings). We connect two vertices
by an edge whenever their assignments are consistent and satisfy all 2CSP constraints induced on
these tuples. In the completeness case, a clique consists of choosing all the vertices that correspond
to a fixed satisfying assignment. In the soundness case (where the value of the 2CSP is low),
the “birthday paradox” guarantees that most pairs of vertices (i.e. two $\tilde{\Omega}(\sqrt{n})$-tuples of
variables) will have a significant intersection (nonempty CSP constraints), thus resulting in lower
densities whenever the 2CSP does not have a satisfying assignment. In the language of two-prover
games, the intuition here is that the verifier has a constant chance of catching the players in a lie
if they are trying to cheat in the game while not satisfying the CSP.
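
As a rough numerical illustration of the birthday-paradox intuition, the sketch below computes the exact probability that two independent, uniformly random $\rho$-subsets of $[n]$ share at least one variable. The function name `intersect_probability` and the sample values of `n` and `rho` are illustrative assumptions, not parameters from the reduction, and sharing a variable is only a proxy for sharing a constraint.

```python
import math

def intersect_probability(n, rho):
    """Probability that two independent uniform rho-subsets of [n] intersect."""
    # The second subset avoids the first with probability C(n - rho, rho) / C(n, rho).
    return 1 - math.comb(n - rho, rho) / math.comb(n, rho)

n = 10_000
p_small = intersect_probability(n, 10)      # rho far below sqrt(n)
p_birthday = intersect_probability(n, 100)  # rho = sqrt(n)
p_large = intersect_probability(n, 300)     # rho = 3 * sqrt(n)
```

Under these assumptions the intersection probability moves from negligible to nearly certain as `rho` crosses $\sqrt{n}$, which is why tuples of size slightly above $\sqrt{n}$ are the natural choice.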

While our construction is simple, analyzing it is intricate. The main challenge is to rule out a 
“cheating” dense subgraph that consists of different assignments to the same variables (inconsistent 
colorings of the same vertices in 3COL). Intuitively, this is similar in spirit to proving a parallel 
repetition theorem where the provers can answer some questions multiple times, and completely 
ignore other questions. Continuing with the parallel repetition metaphor, notice that the challenge 
is doubled: in addition to a cheating prover correlating her answers (the standard obstacle to 
parallel repetition), each prover can now also correlate which questions she chooses to answer. Our 
argument follows by showing that a sufficiently large subgraph must accumulate many non-edges 
(violations of either 2CSP or consistency constraints). To this end we introduce an information 
theoretic argument that carefully counts the entropy of choosing a random vertex in the dense 
subgraph. 

1.1 Open problems 

There are several interesting open problems related to our work. We henceforth list four of them 
that are of particular interest and potential applications. 

Strengthening the inapproximability factor. Our result states that it is hard to distinguish
between a graph containing a $k$-clique and a graph that does not contain a very dense ($1 - \delta$)
$k$-subgraph. The latter ($1 - \delta$) seems to be a limitation of our technique. None of the algorithms
we know (including the two quasi-polynomial time algorithms mentioned above) can distinguish
in polynomial time between a graph containing a $k$-clique and a graph that does not contain even
a slightly dense ($\delta$) $k$-subgraph, for any constant $\delta > 0$, and in fact even for some sub-constant
values of $\delta$. Furthermore, there is evidence [AAM+11] that this problem may indeed be hard. This
naturally leads to the following problem.

Problem 1.4 (Hardness Amplification). Show that for every given constant $\delta > 0$, distinguishing
between the following two cases is ETH-hard:

• There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) = 1$.

• All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) < \delta$.

We remark that a similar amplification, from “clique versus dense” ($\mathrm{den}(S) = 1$ vs.
$\mathrm{den}(S) = 1 - \delta$) to “clique versus sparse” ($\mathrm{den}(S) = 1$ vs. $\mathrm{den}(S) = \delta$), was shown by Alon et al. when the
“clique vs. dense” instance is drawn at random according to the planted clique model [AAM+11].
(Unfortunately, their techniques do not seem to apply to our hard instance.)
An easier variant of Problem 1.4 is to show hardness for a large gap in the imperfect completeness
regime.

Problem 1.5 (Hardness Amplification, imperfect completeness). Show that there exist parameters
$0 < \varepsilon \ll \eta < 1$ for which distinguishing between the following two cases is ETH-hard:

• There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) > \eta$.

• All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) < \varepsilon$.

We note that such gaps can be obtained from average-case hardness for a random $k$-CNF [AAM+11]
and from Unique Games with expansion [RS10].


Beyond quasi-polynomial hardness. Another interesting challenge is to trade the perfect completeness
in our main result for stronger notions of hardness. Indeed, there is substantial evidence
suggesting that the “sparse vs. very-sparse” regime (DkS$(\eta, \varepsilon)$) is much harder to solve. The
gap instance in [BCV+12], where all known linear and semidefinite programming techniques fail, is
a very sparse instance with a large integrality gap. In particular, every vertex has degree
$n^{1/2+o(1)}$, compared to almost linear average degree in our instance. Since no other algorithms succeed
in this regime (even in quasi-polynomial time), it is natural to look for stronger lower bounds
on the running time.


Problem 1.6 (Trading off perfect completeness for stronger lower bounds). Show that there exist
parameters $0 < \varepsilon < \eta \ll 1$ for which distinguishing between the following two cases is NP-hard:

• There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) > \eta$.

• All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) < \varepsilon$.


Finding Stable Communities. The problem of finding Stable Communities is tightly related
to Densest $k$-Subgraph, and has received recent attention in the context of social networks and
learning theory [AGSS12, AGM13, BL13].

Definition 1.7 (Stable Communities [BBB+13]). Let $\alpha, \beta$ with $\beta < \alpha \le 1$ be two positive
parameters. Given an undirected graph $G = (V, E)$, $S \subseteq V$ is an $(\alpha, \beta)$-cluster if $S$ is:

1. Internally Dense: $\forall i \in S$, $|N(i) \cap S| \ge \alpha |S|$.

2. Externally Sparse: $\forall i \notin S$, $|N(i) \cap S| \le \beta |S|$.
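
The two conditions of the definition can be checked directly on a small graph. The sketch below is only an illustration: the helper `is_cluster`, the adjacency-dictionary representation, and the convention that a vertex counts as its own neighbor inside $S$ (so that $\alpha = 1$ is attainable) are assumptions made here, not taken from [BBB+13].

```python
def is_cluster(adj, S, alpha, beta):
    """Check the (alpha, beta)-cluster conditions for vertex set S.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Assumption: for internal density, a vertex is counted as adjacent to itself.
    """
    S = set(S)
    size = len(S)
    internally_dense = all(len((adj[i] | {i}) & S) >= alpha * size for i in S)
    externally_sparse = all(len(adj[i] & S) <= beta * size
                            for i in adj if i not in S)
    return internally_dense and externally_sparse

# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
tight = is_cluster(adj, {0, 1, 2}, alpha=1.0, beta=0.5)
too_strict = is_cluster(adj, {0, 1, 2}, alpha=1.0, beta=0.2)
```

In this toy graph the triangle is internally a clique, and the outside vertex sees one third of it, so the outcome depends only on whether $\beta$ is at least $1/3$.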

Currently, only planted clique based hardness is known. 

Theorem 1.8 ([BBB+13]). For sufficiently small (constant) $\gamma$, finding a $(1, 1 - \gamma)$-cluster is at
least as hard as Planted Clique.

As suggested in the introduction, we believe it is plausible and interesting to see whether the
hardness assumption of the theorem above can be replaced with ETH.

Problem 1.9 (Hardness of Stable Communities). Show that for some $\alpha, \beta$ with $\beta < \alpha \le 1$,
finding an $(\alpha, \beta)$-cluster $S$ is ETH-hard.
2 Preliminaries


Throughout the paper we use $\mathrm{den}(S) \in [0, 1]$ to denote the density of subgraph $S$,

$$\mathrm{den}(S) := \frac{|(S \times S) \cap E|}{|S \times S|}.$$
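
A minimal sketch of this density on a toy graph follows. It assumes that $E$ is stored as a set of ordered pairs that is symmetric and contains all self-loops (so that a clique has density exactly 1); this convention, and the helper names `density` and `symmetric_closure`, are assumptions for illustration.

```python
from itertools import product

def symmetric_closure(edges, vertices):
    """Ordered-pair edge set: both orientations plus self-loops (assumed convention)."""
    E = set()
    for u, v in edges:
        E.add((u, v))
        E.add((v, u))
    E.update((v, v) for v in vertices)
    return E

def density(S, E):
    """den(S) = |(S x S) ∩ E| / |S x S|."""
    S = list(S)
    hits = sum((u, v) in E for u, v in product(S, S))
    return hits / len(S) ** 2

V = range(4)
E = symmetric_closure([(0, 1), (0, 2), (1, 2), (2, 3)], V)
clique_density = density([0, 1, 2], E)  # triangle: all 9 ordered pairs present
path_density = density([1, 2, 3], E)    # 3 self-loops + 2 edges in both orientations
```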


2.1 Information theory 

In this section, we introduce information-theoretic quantities used in this paper. For a more thorough
introduction, the reader should refer to [CT12]. Unless stated otherwise, all logarithms in this paper
are base-2.

Definition 2.1. Let $\mu$ be a probability distribution on sample space $\Omega$. The Shannon entropy (or
just entropy) of $\mu$, denoted by $H(\mu)$, is defined as $H(\mu) := \sum_{\omega \in \Omega} \mu(\omega) \log \frac{1}{\mu(\omega)}$.

Definition 2.2 (Binary Entropy Function). For $p \in [0, 1]$, the binary entropy function is defined
as follows (with a slight abuse of notation): $H(p) := -p \log p - (1 - p) \log(1 - p)$.

Fact 2.3 (Concavity of Binary Entropy). Let $\mu$ be a distribution on $[0, 1]$, and let $p \sim \mu$. Then
$H(\mathbb{E}_{\mu}[p]) \ge \mathbb{E}_{\mu}[H(p)]$.
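
The definitions above translate directly into a few lines of code. The following sketch (with illustrative helper names `entropy` and `binary_entropy`, and an arbitrarily chosen three-point distribution on $[0, 1]$) evaluates both entropies in bits and checks the concavity inequality of Fact 2.3 on one example.

```python
import math

def entropy(mu):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in mu.values() if p > 0)

def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), via the two-point distribution."""
    return entropy({0: 1 - p, 1: p})

three_point = entropy({"a": 0.5, "b": 0.25, "c": 0.25})

# A distribution over biases p: values and their weights (illustrative choice).
biases = [0.1, 0.5, 0.9]
weights = [0.25, 0.5, 0.25]
mean_bias = sum(w * p for w, p in zip(weights, biases))
entropy_of_mean = binary_entropy(mean_bias)
mean_of_entropy = sum(w * binary_entropy(p) for w, p in zip(weights, biases))
```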

For a random variable A we shall write H{A) to denote the entropy of the induced distribution 
on the support of A. We use the same abuse of notation for other information-theoretic quantities 
appearing later in this section. 

Definition 2.4. The conditional entropy of a random variable $A$ conditioned on $B$ is defined as

$$H(A|B) = \mathbb{E}_{b}\left[H(A|B = b)\right].$$

Fact 2.5 (Chain Rule). $H(AB) = H(A) + H(B|A)$.

Fact 2.6 (Conditioning Decreases Entropy). $H(A|B) \ge H(A|BC)$.

Another measure we will use (briefly) in our proof is that of Mutual Information, which informally
captures the correlation between two random variables.

Definition 2.7 (Conditional Mutual Information). The mutual information between two random
variables $A$ and $B$, denoted by $I(A; B)$, is defined as

$$I(A; B) := H(A) - H(A|B) = H(B) - H(B|A).$$

The conditional mutual information between $A$ and $B$ given $C$, denoted by $I(A; B|C)$, is defined as

$$I(A; B|C) := H(A|C) - H(A|BC) = H(B|C) - H(B|AC).$$
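
To make these quantities concrete, the sketch below computes conditional entropy directly from Definition 2.4 and mutual information from Definition 2.7 for a small joint distribution (a uniform bit $A$ and a copy $B$ that is flipped with probability $1/4$). The joint distribution and all helper names are illustrative assumptions.

```python
import math
from collections import defaultdict

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def conditional_entropy(joint, a_idx, b_idx):
    """H(A|B) = E_b[H(A | B = b)], computed directly from the definition."""
    total = 0.0
    for b, p_b in marginal(joint, b_idx).items():
        cond = defaultdict(float)
        for outcome, p in joint.items():
            if tuple(outcome[i] for i in b_idx) == b:
                cond[tuple(outcome[i] for i in a_idx)] += p / p_b
        total += p_b * entropy(cond)
    return total

def mutual_information(joint, a_idx, b_idx):
    return entropy(marginal(joint, a_idx)) - conditional_entropy(joint, a_idx, b_idx)

joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
A, B = (0,), (1,)
# Chain rule: H(AB) = H(A) + H(B|A).
chain_gap = abs(entropy(joint)
                - (entropy(marginal(joint, A)) + conditional_entropy(joint, B, A)))
i_ab = mutual_information(joint, A, B)
i_ba = mutual_information(joint, B, A)
```

On this example the chain rule holds up to floating-point error and the two expressions for $I(A; B)$ agree, as the definition asserts.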

The following is a well-known fact on mutual information. 

Fact 2.8 (Data processing inequality). Suppose we have the following Markov Chain:

$$X \to Y \to Z$$

where $X \perp Z \mid Y$. Then $I(X; Y) \ge I(X; Z)$ or equivalently, $H(X|Y) \le H(X|Z)$.
Mutual Information is related to the following distance measure.

Definition 2.9 (Kullback-Leibler Divergence). Given two probability distributions $\mu_1$ and $\mu_2$ on the
same sample space $\Omega$ such that $(\forall \omega \in \Omega)(\mu_2(\omega) = 0 \Rightarrow \mu_1(\omega) = 0)$, the Kullback-Leibler divergence
between them (also known as relative entropy) is defined as

$$D_{KL}(\mu_1 \,\|\, \mu_2) = \sum_{\omega \in \Omega} \mu_1(\omega) \log \frac{\mu_1(\omega)}{\mu_2(\omega)}.$$

The connection between the mutual information and the Kullback-Leibler divergence is provided 
by the following fact. 

Fact 2.10. For random variables $A$, $B$, and $C$ we have

$$I(A; B|C) = \mathbb{E}_{b,c}\left[D_{KL}\left(A|_{B=b, C=c} \,\|\, A|_{C=c}\right)\right].$$

2.2 2CSP and the PCP Theorem 

In the 2CSP problem, we are given a graph $G = (V, E)$ on $|V| = n$ vertices, where each of the
edges $(u, v) \in E$ is associated with some constraint function $\psi_{u,v} : A \times A \to \{0, 1\}$ which specifies
a set of legal “colorings” of $u$ and $v$, from some finite alphabet $A$ (the 2 in the term “2CSP” stands
for the “arity” of each constraint, which always involves two variables). Let us denote by $\psi$ the
entire 2CSP instance, and define by OPT$(\psi)$ the maximum fraction of satisfied constraints in the
associated graph $G$, over all possible assignments (colorings) of $V$.
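
For tiny instances, OPT$(\psi)$ can be computed by exhaustive search. The sketch below does this for 3-coloring viewed as a 2CSP; the function `opt_value`, the dictionary encoding of constraints, and the two example graphs are illustrative assumptions, and the search is exponential in $n$.

```python
from itertools import product

def opt_value(n, alphabet, constraints):
    """Maximum fraction of satisfied constraints over all assignments (brute force).

    constraints maps an edge (u, v) to a predicate on the two labels.
    """
    best = 0.0
    for assignment in product(alphabet, repeat=n):
        satisfied = sum(pred(assignment[u], assignment[v])
                        for (u, v), pred in constraints.items())
        best = max(best, satisfied / len(constraints))
    return best

neq = lambda a, b: a != b  # the 3COL constraint: endpoints get different colors
triangle = {(0, 1): neq, (1, 2): neq, (0, 2): neq}
k4 = {(u, v): neq for u in range(4) for v in range(u + 1, 4)}

opt_triangle = opt_value(3, range(3), triangle)  # 3-colorable
opt_k4 = opt_value(4, range(3), k4)              # not 3-colorable
```

The complete graph on four vertices is not 3-colorable, so some edge is always violated and its value is $5/6$.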

The starting point of our reduction is the following version of the PCP theorem, which asserts
that it is NP-hard to distinguish a 2CSP instance whose value is 1 from one whose value is at most $1 - \eta$,
where $\eta$ is some small constant:

Theorem 2.11 (PCP Theorem [Din07]). Given a 3SAT instance $\varphi$ of size $n$, there is a polynomial
time reduction that produces a 2CSP instance $\psi$ with size $|\psi| = n \cdot \mathrm{polylog}\, n$ variables and
constraints, and constant alphabet size, such that

• (Completeness) If OPT$(\varphi) = 1$ then OPT$(\psi) = 1$.

• (Soundness) If OPT$(\varphi) < 1$ then OPT$(\psi) < 1 - \eta$, for some constant $\eta = \Omega(1)$.

• (Balance) Every vertex in $\psi$ has degree $d$ for some constant $d$.

In the appendix, we describe in detail how to derive this formulation of the PCP Theorem from
that of e.g. [AIM14].

Notice that since the size of the reduction is near linear, ETH implies that solving the above 
problem requires near exponential time. 

Corollary 2.12. Let $\psi$ be as in Theorem 2.11. Then assuming ETH, distinguishing between
OPT$(\psi) = 1$ and OPT$(\psi) < 1 - \eta$ requires time $2^{\tilde{\Omega}(|\psi|)}$.


3 Main Proof 

3.1 Construction 

Let $\psi$ be the 2CSP instance produced by the reduction in Theorem 2.11, i.e. a constraint graph
over $n$ variables with alphabet $A$ of constant size. We construct the following graph $G_\psi = (V, E)$:
• Let $\rho := \sqrt{n} \log\log n$ and $k := \binom{n}{\rho}$.

• Vertices of $G_\psi$ correspond to all possible assignments (colorings) to all $\rho$-tuples of variables
in $\psi$, i.e. $V = \binom{[n]}{\rho} \times A^{\rho}$. Each vertex is of the form $v = (y_{x_1}, \ldots, y_{x_\rho})$ where $\{x_1, \ldots, x_\rho\}$
are the chosen variables of $v$, and $y_{x_i}$ is the corresponding assignment to variable $x_i$.

• If $v \in V$ violates any 2CSP constraints, i.e. if there is a constraint on $(x_i, x_j)$ in $\psi$ which is
not satisfied by $(y_{x_i}, y_{x_j})$, then $v$ is an isolated vertex in $G_\psi$.

• Let $u = (y_{x_1}, y_{x_2}, \ldots, y_{x_\rho})$ and $v = (y'_{x'_1}, y'_{x'_2}, \ldots, y'_{x'_\rho})$. Then $(u, v) \in E$ if:

  - $(u, v)$ does not violate any consistency constraints: for every shared variable $x_i$, the
    corresponding assignments agree, $y_{x_i} = y'_{x_i}$;

  - and $(u, v)$ also does not violate any 2CSP constraints: for every 2CSP constraint on
    $(x_i, x'_j)$ (if one exists), the assignment $(y_{x_i}, y'_{x'_j})$ satisfies the constraint.

Notice that the size of our reduction (number of vertices of $G_\psi$) is $N = \binom{n}{\rho} \cdot |A|^{\rho} = 2^{\tilde{O}(\sqrt{n})}$.

Completeness. If OPT$(\psi) = 1$, then $G_\psi$ has a $k$-clique: fix a satisfying assignment for $\psi$, and
let $S$ be the set of all vertices that are consistent with this assignment. Notice that $|S| = \binom{n}{\rho} = k$.
Furthermore its vertices do not violate any consistency constraints (since they agree with a single
assignment), or 2CSP constraints (since we started from a satisfying assignment).
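
The construction and the completeness argument can be sketched on a toy instance as follows. This is a minimal illustration under stated assumptions: the helper names `satisfied` and `build_graph`, the 4-cycle 3COL instance, and the fixed value `rho=2` (instead of $\sqrt{n}\log\log n$) are chosen here for readability and are not part of the reduction itself.

```python
from itertools import combinations, product

def satisfied(assign, constraints):
    """True if every constraint whose two variables are both assigned holds."""
    return all(pred(assign[u], assign[v])
               for (u, v), pred in constraints.items()
               if u in assign and v in assign)

def build_graph(n, alphabet, constraints, rho):
    """Vertices: labeled rho-subsets of variables. Edges: consistent, non-violating pairs."""
    vertices = [tuple(zip(chosen, labels))
                for chosen in combinations(range(n), rho)
                for labels in product(alphabet, repeat=rho)]
    edges = set()
    for u, v in combinations(vertices, 2):
        du, dv = dict(u), dict(v)
        if any(du[x] != dv[x] for x in du.keys() & dv.keys()):
            continue  # consistency violation on a shared variable
        # Checking the merged assignment covers constraints inside u, inside v,
        # and across the pair, so internally violating vertices stay isolated.
        if satisfied({**du, **dv}, constraints):
            edges.add(frozenset((u, v)))
    return vertices, edges

neq = lambda a, b: a != b
cycle = {(0, 1): neq, (1, 2): neq, (2, 3): neq, (0, 3): neq}
vertices, edges = build_graph(4, (0, 1, 2), cycle, rho=2)

coloring = {0: 0, 1: 1, 2: 0, 3: 1}  # a satisfying assignment of the 4-cycle
S = [v for v in vertices if all(coloring[x] == y for x, y in v)]
is_clique = all(frozenset(pair) in edges for pair in combinations(S, 2))
```

Here $|S| = \binom{4}{2} = 6$, matching $k = \binom{n}{\rho}$, and every pair in $S$ is adjacent.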

4 Soundness 

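Suppose that OPT$(\psi) < 1 - \eta$, and let $\varepsilon_0 > 0$ be some constant to be determined later. We shall
show that for any subset $S$ of size $k' = k \cdot 2^{-\Omega(\log |V| / \log\log |V|)}$ (as in Theorem 1.3, with the
constant in the exponent determined by $\varepsilon_0$), $\mathrm{den}(S) < 1 - \delta$, where $\delta$ is some constant
depending on $\eta$. The remainder of this section is devoted to proving the following theorem:

Theorem 4.1 (Soundness). If OPT$(\psi) < 1 - \eta$, then for every $S \subseteq V$ of size $k'$,
$\mathrm{den}(S) < 1 - \delta$ for some constant $\delta$.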

4.1 Setting up the entropy argument 

Fix some subset $S$ of size $k'$, and let $v \in_R S$ be a uniformly chosen vertex in $S$ (recall that $v$
is a vector of $\rho$ coordinates, corresponding to labels for a subset of $\rho$ chosen variables). Let $X_i$
denote the indicator variable associated with $v$ such that $X_i = 1$ if the $i$’th variable appears in $v$
and 0 otherwise. We let $Y_i$ represent the coloring assignment (label) for the $i$’th variable whenever
$X_i = 1$, which is of the form $l \in A$. Throughout the proof, let

$$W_{i-1} = X_{<i}, Y_{<i}$$

denote the $i$-th prefix corresponding to $v$. We can write:

$$H(Y_i | W_{i-1}, X_i) = \Pr[X_i = 0] \cdot H(Y_i | W_{i-1}, X_i = 0) + \Pr[X_i = 1] \cdot H(Y_i | W_{i-1}, X_i = 1)$$

$$= \Pr[X_i = 1] \cdot H(Y_i | W_{i-1}, X_i = 1)$$

since $H(Y_i | W_{i-1}, X_i = 0) = 0$. Notice that since $(XY)$ and $v$ determine each other, and $v$ was
uniform on a set of size $|S| = k'$, we have
Observation 4.2. $H(XY) = \log k'$.

Thus, in total, the choice of challenge and the choice of assignments should contribute $\log k'$ to
the entropy of $v$. If much of the entropy comes from the assignment distribution (conditioned on the
fixed challenge variables), we will show that $S$ must have many consistency violations, implying that
$S$ is sparse. If, on the other hand, almost all the entropy comes from the challenge distribution, we
will show that this implies many CSP constraint violations (implied by the soundness assumption).
From now on, we denote

$$\alpha_i := H(X_i | X_{<i}, Y_{<i}) \quad \text{and} \quad \beta_i := H(Y_i | X_{\le i}, Y_{<i}).$$

When conditioning on the $i$’th prefix, we shall write $\alpha_i(w_{i-1}) := H(X_i | X_{<i}, Y_{<i} = w_{i-1})$, and
similarly for $\beta_i(\cdot)$. Also for brevity, we denote

$$q_i := \Pr[X_i = 1] \quad \text{and} \quad q_i(w_{i-1}) := \Pr[X_i = 1 | w_{i-1}].$$


Prefix graphs 

The consistency constraints induce, for each i, a graph over the prefixes; the vertices are the 
prefixes, and two prefixes are connected by an edge if their labels are consistent. (We can ignore 
the 2CSP constraints for now — the prefix graph will be used only in the analysis of the consistency 
constraints.) Formally, 

Definition 4.3 (Prefix graph). For $i \in [n + 1]$ let the $i$-th prefix graph, $G_i$, be defined over the
prefixes of length $i - 1$ as follows. We say that $w_{i-1}$ is a neighbor of $\sigma_{i-1}$ if they do not violate any
consistency constraints. Namely, for all $j < i$, if $X_j = 1$ for both $w_{i-1}$ and $\sigma_{i-1}$, then $w_{i-1}$ and $\sigma_{i-1}$
assign the same label $Y_j$.

In particular, we will heavily use the following notation: let $\mathcal{N}(w_{i-1})$ be the prefix neighborhood
of $w_{i-1}$; i.e. it is the set of all prefixes (of length $i - 1$) that are consistent with $w_{i-1}$. For technical
issues of normalization, we let $w_{i-1} \in \mathcal{N}(w_{i-1})$, i.e. all the prefixes have self-loops.

Notice that $G_{n+1}$ is defined over the vertices of $S$ (the original subgraph). The set of edges on
$S$ is contained in the set of edges of $G_{n+1}$, since in the latter we only remove pairs that violate
consistency constraints (recall that we ignore the 2CSP constraints).

Unless stated otherwise, we always think of prefixes as weighted by their probabilities. Naturally, 
we also define the weighted degree and weighted edge density of the prefix graph. 

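Definition 4.4 (Prefix degree and density). The prefix degree of $w_{i-1}$ is given by:

$$\deg(w_{i-1}) = \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \Pr[\sigma_{i-1}].$$

Similarly, we define the prefix density of $G_i$ as:

$$\mathrm{den}(G_i) = \sum_{w_{i-1}} \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \Pr[w_{i-1}] \cdot \Pr[\sigma_{i-1}].$$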

When it is clear from the context, we henceforth drop the prefix qualification, and simply refer
to the neighborhood or degree, etc., of $w_{i-1}$.

Notice that in $G_{n+1}$, the probabilities are uniformly distributed. In particular, $\mathrm{den}(G_{n+1}) \ge \mathrm{den}(S)$,
since, as we mentioned earlier, the set of edges in $S$ is contained in that of $G_{n+1}$. Finally,
observe also that because we accumulate violations, the density of the prefix graphs is monotonically
non-increasing with $i$.

Observation 4.5.

$$\mathrm{den}(G_1) \ge \cdots \ge \mathrm{den}(G_{n+1}) \ge \mathrm{den}(S).$$
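
The monotonicity of the prefix densities can be observed on a toy subset $S$. The sketch below relies on the fact that the double sum in Definition 4.4 equals the probability that two independent uniform vertices of $S$ have consistent prefixes; the example set `S`, the labels, and the helper names are illustrative assumptions.

```python
def consistent_prefix(u, v, length):
    """True if u and v agree on every shared variable among the first `length` ones."""
    return all(u[x] == v[x] for x in range(length) if x in u and x in v)

def prefix_densities(S, n):
    """Entry `length` is den(G_{length+1}): the fraction of ordered pairs
    (including self-pairs) whose prefixes of that length are consistent."""
    m = len(S)
    return [sum(consistent_prefix(u, v, length) for u in S for v in S) / m ** 2
            for length in range(n + 1)]

# Each vertex maps its chosen variables (out of n = 4) to labels.
S = [{0: "r", 1: "g"}, {0: "r", 2: "b"}, {1: "g", 3: "r"}, {0: "g", 1: "g"}]
dens = prefix_densities(S, n=4)
non_increasing = all(a >= b for a, b in zip(dens, dens[1:]))
```

In this example the only inconsistencies arise on variable 0, so the density drops once (when the first label is revealed) and stays constant afterwards.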



Useful approximations 

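We use the following bounds on $\alpha_i$ and $\beta_i$ many times throughout the proof:

Fact 4.6.

$$\alpha_i = \mathbb{E}_{w_{i-1}}\left[H(q_i(w_{i-1}))\right] \le H\left(\mathbb{E}_{w_{i-1}}[q_i(w_{i-1})]\right) = H(q_i)$$

Fact 4.7.

$$\beta_i = \mathbb{E}_{w_{i-1}}\left[\beta_i(w_{i-1})\right] \le \mathbb{E}_{w_{i-1}}\left[q_i(w_{i-1}) \cdot \log |A|\right] = q_i \log |A|$$

Proof. The bound on $\alpha_i$ follows from concavity of entropy (Fact 2.3). For the second bound, observe
that $\beta_i$ is maximized by spreading the $q_i$ mass uniformly over the alphabet $A$. □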
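
The quantities $\alpha_i$, $\beta_i$ and $q_i$ can be computed exactly for a small subset $S$, which makes it possible to check Observation 4.2 and Facts 4.6 and 4.7 numerically. The sketch below is an illustration under assumptions: the example set `S`, the alphabet of size 3, and the helper names are made up here, and the conditional entropies are obtained via the chain rule from empirical prefix distributions.

```python
import math
from collections import Counter

def empirical_entropy(keys):
    """Entropy (bits) of the uniform-over-S distribution pushed onto `keys`."""
    counts = Counter(keys)
    m = len(keys)
    return sum(c / m * math.log2(m / c) for c in counts.values())

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def alpha_beta(S, n):
    alphas, betas, qs = [], [], []
    for i in range(n):
        prefix = [tuple((x in v, v.get(x)) for x in range(i)) for v in S]
        with_x = [w + (i in v,) for w, v in zip(prefix, S)]
        with_xy = [w + (v.get(i),) for w, v in zip(with_x, S)]
        h_prefix, h_x, h_xy = map(empirical_entropy, (prefix, with_x, with_xy))
        alphas.append(h_x - h_prefix)  # H(X_i | W_{i-1})
        betas.append(h_xy - h_x)       # H(Y_i | W_{i-1}, X_i)
        qs.append(sum(i in v for v in S) / len(S))
    return alphas, betas, qs

S = [{0: "r", 1: "g"}, {0: "r", 2: "b"}, {1: "g", 3: "r"}, {0: "g", 1: "g"}]
alphabet_size = 3
alphas, betas, qs = alpha_beta(S, n=4)
total_entropy = sum(alphas) + sum(betas)  # should equal log2 |S|
```

Since the four vertices are distinct, the telescoping sum recovers $\log k' = 2$ bits, and each term respects the two upper bounds.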

We also recall some elementary approximations to logarithms and entropies that will be useful 
in the analysis. The proofs are deferred to the appendix. 

Fact 4.8. For $k = \binom{n}{\rho}$,

$$\log k = nH\left(\frac{\rho}{n}\right) \pm O(\log n) = \left(\tfrac{1}{2} \pm o(1)\right) \rho \log n.$$
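
A quick numerical check of this approximation is sketched below for one illustrative value of $n$ (the choice `n = 10_000` and the variable names are assumptions; the $O(\log n)$ and $o(1)$ terms are only probed at this single point, not asymptotically).

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 10_000
rho = round(math.sqrt(n) * math.log2(math.log2(n)))  # rho = sqrt(n) * log log n

log_k = math.log2(math.comb(n, rho))        # exact log of the binomial coefficient
entropy_estimate = n * binary_entropy(rho / n)
gap = entropy_estimate - log_k              # expected to be O(log n)
ratio = log_k / (rho * math.log2(n))        # expected to approach 1/2 slowly
```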

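More useful to us will be the following bound on $\log k'$.

Fact 4.9. Let $\varepsilon_1$ be a sufficiently large constant multiple of $\varepsilon_0$, and let $k, k', V, n, \rho$ be as specified in the construction. Then,

$$\log k' \ge \max\left\{\log k,\; nH\left(\frac{\rho}{n}\right)\right\} - \varepsilon_1 \log k / \log n.$$

In particular, this means that most indices $i$ should contribute roughly $H\left(\frac{\rho}{n}\right)$ entropy to the
choice of $v$.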

We will also need the following bound which relates the entropies of a very biased coin and a 
slightly less biased one: 

Fact 4.10. Let 1/n <C |n| <C 1, then 


H 


=^(-)--log--(loge)|^ + 0(n-2)+o(^) 
\ n J \n J n n In 


4.2 Consistency violations 

In this section, we show that if the entropy contribution of the assignments ($\sum_i H(Y_i | X_{\le i}, Y_{<i})$) is
large, there are many consistency violations between vertices, which lead to constant density loss.
First, we show that if $\sum_i H(Y_i | X_{\le i}, Y_{<i}) > 5\varepsilon_1 \log k / \log n$, then at least a constant fraction of such
entropy is concentrated on “good” variables.

Definition 4.11 (Good Variables). We say that an index $i$ is good if

• $\alpha_i \ge H(q_i) - 2 q_i \log |A|$

• $\beta_i \ge \frac{1}{2} \varepsilon_1 q_i$

where $\varepsilon_1$ is a constant to be determined later in the proof.
Claim 4.12. For any constant $\varepsilon_1$, if $\sum_i \beta_i > 5\varepsilon_1 \log k / \log n$, then

$$\sum_{\text{good } i\text{'s}} q_i^2 \ge (0.2\, \varepsilon_1 \rho)^2 / (n \log^2 |A|) = \Omega(\rho^2 / n).$$

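Proof. We want to show that many of the indices $i$ have both a large $\alpha_i$ and a large $\beta_i$ simultaneously.
We can write

$$\sum_{i \in [n]} (\alpha_i + \beta_i) = \sum_{i:\, \alpha_i + \beta_i < H(q_i) - q_i \log |A|} (\alpha_i + \beta_i) + \sum_{i:\, \alpha_i + \beta_i \ge H(q_i) - q_i \log |A|} (\alpha_i + \beta_i).$$

Using Facts 4.6 and 4.7 we have

$$\sum_{i \in [n]} (\alpha_i + \beta_i) \le \sum_{i:\, \alpha_i + \beta_i < H(q_i) - q_i \log |A|} (H(q_i) - \beta_i) + \sum_{i:\, \alpha_i + \beta_i \ge H(q_i) - q_i \log |A|} (H(q_i) + \beta_i). \tag{1}$$

Because the subgraph is of size $k'$, from the expansion of $\log k'$ (Fact 4.9),

$$\sum_{i \in [n]} (\alpha_i + \beta_i) \ge nH\left(\frac{\rho}{n}\right) - \varepsilon_1 \log k / \log n \ge \sum_{i \in [n]} H(q_i) - \varepsilon_1 \log k / \log n,$$

where the second inequality follows from the concavity of entropy. Plugging into (1), we have

$$\sum_{i:\, \alpha_i + \beta_i \ge H(q_i) - q_i \log |A|} \beta_i \ge \sum_{i:\, \alpha_i + \beta_i < H(q_i) - q_i \log |A|} \beta_i - \varepsilon_1 \log k / \log n$$

$$= \Big( \sum_{i \in [n]} \beta_i - \sum_{i:\, \alpha_i + \beta_i \ge H(q_i) - q_i \log |A|} \beta_i \Big) - \varepsilon_1 \log k / \log n.$$

Rearranging, we get

$$\sum_{i:\, \alpha_i + \beta_i \ge H(q_i) - q_i \log |A|} \beta_i \ge \frac{1}{2} \sum_{i \in [n]} \beta_i - \varepsilon_1 \log k / \log n. \tag{2}$$

For all the $i$’s in the LHS summation, $\alpha_i \ge H(q_i) - 2 q_i \log |A|$ by Fact 4.7. From now on, we
will consider only $i$’s that satisfy this condition. Now, using the premise on $\sum_i \beta_i$ and (2) we have:

$$\sum_{i:\, \alpha_i \ge H(q_i) - 2 q_i \log |A|} \beta_i \ge (5/2 - 1)\, \varepsilon_1 \log k / \log n \ge 0.7\, \varepsilon_1 \rho,$$

where the second inequality follows from our approximation for $\log k$ (Fact 4.8).

We want to further restrict our attention to $i$’s for which $\beta_i$ is at least $\frac{1}{2} \varepsilon_1 q_i$ (aka good $i$’s).
Note that the sum in the above inequality can be decomposed into

$$\sum_{\text{good } i\text{'s}} \beta_i + \sum_{i:\, \alpha_i \ge H(q_i) - 2 q_i \log |A|,\; \beta_i < \frac{1}{2} \varepsilon_1 q_i} \beta_i.$$

Now via a simple sum bound,

$$\sum_{i:\, \alpha_i \ge H(q_i) - 2 q_i \log |A|,\; \beta_i < \frac{1}{2} \varepsilon_1 q_i} \beta_i < \frac{1}{2} \varepsilon_1 \sum_{i} q_i = \frac{1}{2} \varepsilon_1 \rho.$$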
i-. ai>H{qi)-2qi\og\A\ i 
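Rearranging, we get

$$\sum_{\text{good } i\text{'s}} \beta_i \ge 0.7\, \varepsilon_1 \rho - \tfrac{1}{2} \varepsilon_1 \rho = 0.2\, \varepsilon_1 \rho.$$

By Cauchy-Schwarz we have:

$$\sum_{\text{good } i\text{'s}} \beta_i^2 \ge \frac{1}{n} \Big( \sum_{\text{good } i\text{'s}} \beta_i \Big)^2 \ge (0.2\, \varepsilon_1 \rho)^2 / n.$$

Finally, since $\beta_i \le q_i \log |A|$,

$$\sum_{\text{good } i\text{'s}} q_i^2 \ge \sum_{\text{good } i\text{'s}} \beta_i^2 / \log^2 |A| \ge (0.2\, \varepsilon_1 \rho)^2 / (n \log^2 |A|).$$

□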

In the same spirit, we now define a notion of a “good” prefix. Intuitively, conditioning on a good 
prefix leaves a significant amount of entropy on the i’th index. We also require that a good prefix 
has a high prefix degree; that is, it has many neighbors it could potentially lose when revealing the 
i-th label. 

Definition 4.13 (Good Prefixes). We say $w_{i-1}$ is a good prefix if

• $i$ is good;

• $\beta_i(w_{i-1}) \ge \varepsilon_3\, q_i(w_{i-1})$

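where $\varepsilon_3 = (\varepsilon_4 + \kappa) \log |A|$, with $\varepsilon_4$ an arbitrarily small constant that denotes the fraction of
assignments that disagree with the majority of the assignments, $\kappa$ an $O(1/\log |A|)$ factor, and $\varepsilon_2$ a
constant tied to $\delta$ (where $\mathrm{den}(S) = 1 - \delta$) so that $D_{KL}(\varepsilon_2 \,\|\, \sqrt{\delta}) > 2 \log |A|$.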

In the following claim, we show that these prefixes contribute some constant fraction of entropy,
assuming that our subset is dense.

Claim 4.14. If $\mathrm{den}(S) > 1 - \delta$, with $\delta$ as in Definition 4.13, and $\varepsilon_1 > 4\varepsilon_2 \log |A| + 8\varepsilon_3$, then for every
good index $i$, it holds that

$$\sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) \ge \beta_i / 4.$$

Proof. We begin by proving that most prefixes satisfy the degree condition of Definition 4.13. Let
$w_{i-1}$ be popular if $i$ is a good variable and its degree in the prefix graph $G_i$ satisfies
$\deg(w_{i-1}) := \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \Pr[\sigma_{i-1}] \ge 1 - \sqrt{\delta}$. Recall that $\mathrm{den}(G_i) \ge \mathrm{den}(S) \ge (1 - \delta)$ (by Observation 4.5).
Thus by Markov's inequality, at most a $\sqrt{\delta}$-fraction of the prefixes are unpopular.

Let $Z(w_{i-1})$ be the indicator variable for $w_{i-1}$ being popular. For the sake of contradiction, suppose
that more than an $\varepsilon_2$-fraction of the $q_i$-mass is concentrated on unpopular prefixes, that is:

$$\sum_{\text{unpopular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) = \Pr[Z(W_{i-1}) = 0] \cdot \Pr[X_i = 1 \mid Z(W_{i-1}) = 0] > \varepsilon_2 q_i. \tag{3}$$
unpopular Wi-fis 
We would like to argue that this condition implies that the distribution on the $X_i$’s is highly biased
by the conditioning on the (popularity of the) prefix; this in turn implies that $\alpha_i$, the expected
conditional entropy of $X_i$, must be low, contradicting the assumption that $i$ is good. Indeed, by the
data-processing inequality (Fact 2.8),

$$\alpha_i = H(X_i | W_{i-1}) \le H(X_i | Z(W_{i-1})) = H(X_i) - I(X_i; Z(W_{i-1})). \tag{4}$$

Since we can write mutual information as expected KL-divergence (Fact 2.10), and KL-divergence
is non-negative, we get

$$I(X_i; Z(W_{i-1})) = \mathbb{E}_{x_i}\left[D_{KL}\left(Z(W_{i-1})|_{x_i} \,\|\, Z(W_{i-1})\right)\right]$$

$$\ge q_i \cdot D_{KL}\left(\Pr[Z(W_{i-1}) = 1 \mid x_i = 1] \,\|\, \Pr[Z(W_{i-1}) = 1]\right)$$

$$\ge q_i \cdot D_{KL}\left(1 - \varepsilon_2 \,\|\, 1 - \sqrt{\delta}\right) = q_i \cdot D_{KL}\left(\varepsilon_2 \,\|\, \sqrt{\delta}\right),$$

where the second inequality follows from the fact that for all good $i$’s, our degree assumption implies
$\Pr[Z(W_{i-1}) = 1] \ge (1 - \sqrt{\delta})$, and our assumption in (3) implies, via Bayes rule, that
$\Pr[Z(W_{i-1}) = 0 \mid X_i = 1] > \varepsilon_2$, and therefore $\Pr[Z(W_{i-1}) = 1 \mid X_i = 1] < 1 - \varepsilon_2$. Note that by our setting of
parameters $1 - \sqrt{\delta} > 1 - \varepsilon_2$.
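
The KL divergence between two biased coins that appears in this bound is easy to evaluate. The sketch below (with a hypothetical helper `binary_kl` and illustrative values for $\varepsilon_2$, $|A|$ and $\sqrt{\delta}$) shows the symmetry $D_{KL}(1 - a \,\|\, 1 - b) = D_{KL}(a \,\|\, b)$ used above, and how small $\sqrt{\delta}$ must be before the divergence exceeds $2 \log |A|$.

```python
import math

def binary_kl(a, b):
    """D_KL(Bernoulli(a) || Bernoulli(b)) in bits, for 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log2(a / b)
    if a < 1:
        total += (1 - a) * math.log2((1 - a) / (1 - b))
    return total

eps2, alphabet_size = 0.1, 3           # illustrative values, not from the paper
threshold = 2 * math.log2(alphabet_size)

mild = binary_kl(eps2, 0.05)           # sqrt(delta) = 0.05: divergence far too small
tiny = binary_kl(eps2, 1e-12)          # sqrt(delta) = 1e-12: divergence above threshold
symmetry_gap = abs(binary_kl(1 - eps2, 1 - 0.05) - binary_kl(eps2, 0.05))
```

With these sample values, $\delta$ has to be extremely small relative to $\varepsilon_2$ for the contradiction to go through, which is consistent with $\delta$ being a (tiny) constant depending on the other parameters.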
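Plugging into (4) we have:

$$\alpha_i \le H(q_i) - q_i\, D_{KL}\left(\varepsilon_2 \,\|\, \sqrt{\delta}\right). \tag{5}$$

On the other hand, recall that since $i$ is good, $\alpha_i \ge H(q_i) - 2 q_i \log |A|$. Recall also that $\delta$ is
chosen (Definition 4.13) so that $D_{KL}(\varepsilon_2 \,\|\, \sqrt{\delta}) > 2 \log |A|$. Thus, we get a contradiction to (3). From now
on we assume

$$\sum_{\text{unpopular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \le \varepsilon_2 q_i. \tag{6}$$

This implies that even if the assignment is uniform over the alphabet, the contribution to $\beta_i$
from unpopular prefixes is small:

$$\sum_{\text{unpopular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) \le \sum_{\text{unpopular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \log |A|$$

$$\le \varepsilon_2 q_i \log |A| \le \tfrac{1}{4} \varepsilon_1 q_i \le \tfrac{1}{2} \beta_i$$

where the first inequality follows from Fact 4.7, the second from (6), the third from our setting of
$\varepsilon_1 \ge 4 \varepsilon_2 \log |A|$, and the fourth from $\beta_i \ge \frac{1}{2} \varepsilon_1 q_i$ since $i$ is good. Therefore,

$$\sum_{\text{popular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) = \beta_i - \sum_{\text{unpopular } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) \ge \beta_i / 2.$$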

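Using a similar argument, we show that for any popular $w_{i-1}$, most of the $q_i$ mass is concentrated on its neighbors. Consider any popular $w_{i-1}$, and let $\mathcal{N}^c(w_{i-1})$ denote the complement of $\mathcal{N}(w_{i-1})$. Then we can rewrite $\alpha_i$ as:

$$\alpha_i = \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \Pr[\sigma_{i-1}]\, \alpha_i(\sigma_{i-1}) + \sum_{\sigma_{i-1} \in \mathcal{N}^c(w_{i-1})} \Pr[\sigma_{i-1}]\, \alpha_i(\sigma_{i-1}).$$

Notice that since $w_{i-1}$ is popular, $\mathcal{N}^c(w_{i-1})$ has measure at most $\sqrt{\delta}$. Thus, if an $\varepsilon_2$-fraction of the $q_i$ mass is concentrated on $\mathcal{N}^c(w_{i-1})$, we once again (like in (5)) have

$$\alpha_i \le H(q_i) - q_i D_{KL}\left( \varepsilon_2 \,\middle\|\, \sqrt{\delta} \right),$$

which would again yield a contradiction to $i$ being a good variable. Therefore every popular prefix also satisfies the $q_i$-weighted condition on the degree:

$$\sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \Pr[\sigma_{i-1}]\, q_i(\sigma_{i-1}) \ge (1 - \varepsilon_2)\, q_i. \tag{7}$$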

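Recall that a prefix $w_{i-1}$ is good if it also satisfies $\beta_i(w_{i-1}) \ge \varepsilon_3 \cdot q_i(w_{i-1})$. Fortunately, prefixes that violate this condition (i.e. those with small $\beta_i(w_{i-1})$) cannot account for much of the weight on $\beta_i$:

$$\sum_{w_{i-1} \,:\, \beta_i(w_{i-1}) < \varepsilon_3 q_i(w_{i-1})} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) \le \varepsilon_3 q_i.$$

Since $i$ is good and $\varepsilon_1 > 8\varepsilon_3$, this implies:

$$\sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) \ge \beta_i/2 - \varepsilon_3 q_i \ge \beta_i/4,$$

since

$$\varepsilon_3 q_i \le \tfrac{1}{8}\varepsilon_1 q_i \le \tfrac{1}{4}\beta_i,$$

where the last inequality follows from $i$ being good. □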

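Corollary 4.15. For every good index $i$,

$$\sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \ge \frac{\varepsilon_1}{8\log|A|}\, q_i.$$

Proof.

$$\begin{aligned}
\sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) &\ge \sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, \beta_i(w_{i-1}) / \log|A| && \text{(Fact 4.7)} \\
&\ge \beta_i / (4\log|A|) && \text{(Claim 4.14)} \\
&\ge \frac{\varepsilon_1}{8\log|A|}\, q_i && \text{(Definition of good } i\text{)}
\end{aligned}$$

□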


With Claim 4.12 and Corollary 4.15 we are ready to prove the main lemma of this section:

Lemma 4.16 (Labeling Entropy Bound). If Y^^H{Yi\X<i,Y^i) > then den(S') < 1 — <5. 

Proof. Assume for a contradiction that $\mathrm{den}(S) \ge 1 - \delta$. For prefix $w_{i-1}$, let $\mathcal{D}_{w_{i-1}}$ denote the induced distribution on labels to the $i$-th variable, conditioned on $w_{i-1}$ and $X_i = 1$. (If $q_i(w_{i-1}) = 0$, take an arbitrary distribution.) After revealing each variable $i$, the loss in prefix density is given by the probability of "fresh violations": the sum over all prefix edges $(w_{i-1}, \sigma_{i-1})$ of the probability that they assign different labels to the $i$-th variable:

$$\mathrm{den}(G_i) - \mathrm{den}(G_{i+1}) = \sum_{w_{i-1}} \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \left( \Pr[w_{i-1}] \Pr[\sigma_{i-1}]\, q_i(w_{i-1})\, q_i(\sigma_{i-1}) \right) \Pr_{\mathcal{D}_{w_{i-1}} \times \mathcal{D}_{\sigma_{i-1}}}\left[ Y_i \ne Y_i' \right]. \tag{8}$$

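We now lower-bound $\Pr_{\mathcal{D}_{w_{i-1}} \times \mathcal{D}_{\sigma_{i-1}}}[Y_i \ne Y_i']$ for good $w_{i-1}$ (notice that we assume nothing about $\sigma_{i-1}$). A simple calculation shows that for $\kappa < 1/2$, if

$$\beta_i(w_{i-1}) > \left( \kappa \log|A| - \kappa \log \kappa - (1 - \kappa) \log(1 - \kappa) \right) q_i(w_{i-1}),$$

then the probability mass (under $\mathcal{D}_{w_{i-1}}$) on the most common label is at most $1 - \kappa$. Observe that this probability is an upper bound on $\Pr_{\mathcal{D}_{w_{i-1}} \times \mathcal{D}_{\sigma_{i-1}}}[Y_i = Y_i']$. For good $w_{i-1}$, we indeed have

$$\beta_i(w_{i-1}) \ge \varepsilon_3\, q_i(w_{i-1}) > \left( \varepsilon_4 \log|A| - \varepsilon_4 \log \varepsilon_4 - (1 - \varepsilon_4) \log(1 - \varepsilon_4) \right) q_i(w_{i-1}),$$

where the second inequality follows from the choice of $\varepsilon_4$. Therefore $\Pr_{\mathcal{D}_{w_{i-1}} \times \mathcal{D}_{\sigma_{i-1}}}[Y_i \ne Y_i'] \ge \varepsilon_4$.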
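The "simple calculation" above can be checked numerically. The following Python sketch, with hypothetical helper names and randomly generated label distributions (assumptions for illustration only), tests that a distribution over an alphabet of size $|A|$ whose entropy exceeds $\kappa\log|A| + h(\kappa)$ puts mass at most $1 - \kappa$ on its most common label, and that this mass upper-bounds the probability that two independent labels agree.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def binary_entropy(kappa):
    """h(kappa) = -kappa log kappa - (1 - kappa) log(1 - kappa), in bits."""
    return entropy_bits([kappa, 1.0 - kappa])

def max_mass_bound_holds(dist, kappa):
    """If H(dist) > kappa*log|A| + h(kappa), verify max(dist) <= 1 - kappa."""
    threshold = kappa * math.log2(len(dist)) + binary_entropy(kappa)
    if entropy_bits(dist) > threshold:
        return max(dist) <= 1.0 - kappa
    return True  # premise fails, nothing to verify

def collision_probability(dist_a, dist_b):
    """Probability that independent labels drawn from dist_a and dist_b are equal."""
    return sum(a * b for a, b in zip(dist_a, dist_b))

# Small illustrative example with |A| = 3 and kappa = 0.1.
example = [0.5, 0.25, 0.25]
print(max_mass_bound_holds(example, 0.1))
print(collision_probability(example, [1.0, 0.0, 0.0]) <= max(example))
```

Note that the collision bound needs no assumption on the second distribution, matching the remark that nothing is assumed about $\sigma_{i-1}$.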

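We now have, for every good index $i$,

$$\begin{aligned}
\mathrm{den}(G_i) - \mathrm{den}(G_{i+1}) &\ge \sum_{\text{good } w_{i-1}\text{'s}} \; \sum_{\sigma_{i-1} \in \mathcal{N}(w_{i-1})} \left( \Pr[w_{i-1}] \Pr[\sigma_{i-1}]\, q_i(w_{i-1})\, q_i(\sigma_{i-1}) \right) \varepsilon_4 && \text{(Eq. (8))} \\
&\ge \varepsilon_4\, q_i (1 - \varepsilon_2) \sum_{\text{good } w_{i-1}\text{'s}} \Pr[w_{i-1}]\, q_i(w_{i-1}) && \text{(Definition of good prefix)} \\
&\ge \frac{\varepsilon_1 \varepsilon_4}{10 \log|A|}\, q_i^2 && \text{(Corollary 4.15 + } \varepsilon_2 < 0.2\text{)}
\end{aligned}$$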

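Finally, summing over all good $i$'s, we get a negative density for $S$, which is, of course, a contradiction.

$$\begin{aligned}
1 - \mathrm{den}(S) &\ge \mathrm{den}(G_1) - \mathrm{den}(G_{n+1}) && \text{(Observation 4.5)} \\
&= \sum_i \left( \mathrm{den}(G_i) - \mathrm{den}(G_{i+1}) \right) && \text{(telescoping sum)} \\
&\ge \sum_{\text{good } i\text{'s}} \left( \mathrm{den}(G_i) - \mathrm{den}(G_{i+1}) \right) \\
&\ge \sum_{\text{good } i\text{'s}} \frac{\varepsilon_1 \varepsilon_4}{10 \log|A|}\, q_i^2 \\
&\ge \frac{\varepsilon_1^3 \varepsilon_4}{250 \log^3|A|} \cdot \frac{p^2}{n} = \Omega\left( \varepsilon_4\, p^2/n \right). && \text{(Claim 4.12)}
\end{aligned}$$

□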


4.3 2CSP violation 

Intuitively, if $\sum_i H(X_i \mid X_{<i}, Y_{<i})$ is large, then the subgraph approximately corresponds to assignments to all subsets in $\binom{[n]}{p}$. More specifically, in this section we show that most of the constraints appear approximately as frequently as we expect. Since in any assignment a constant fraction of them must be violated, this implies (eventually) that a constant fraction of the edges have a violated constraint.

First, we show that most of the variables appear approximately as frequently as we expect by showing that most of them are "typical."

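Definition 4.17 (Typical variables). Prefix $w_{i-1}$ is typical if

$$(1 - \varepsilon_5) \cdot p/n \le \Pr[X_i = 1 \mid w_{i-1}] \le (1 + \varepsilon_5) \cdot p/n,$$

where $\varepsilon_5$ is some constant such that $\varepsilon_5 > 14\varepsilon_1$.

Similarly, we say that variable $X_i$ is typical if

$$\sum_{\text{typical } w_{i-1}\text{'s}} \Pr[w_{i-1}] \ge 1 - \varepsilon_5.$$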


Claim 4.18. If Y^^H{Xi\X^i,Y^i) > ) logfe = logk — 0(p), then all but at most e^n 

variables are typical. 


Proof. Assume by contradiction that there are $\varepsilon_5 n$ atypical variables. That is, $\varepsilon_5 n/2$ variables $X_i$ appear with probability at least $(1 + \varepsilon_5) \cdot p/n$ (or at most $(1 - \varepsilon_5) \cdot p/n$) for an $(\varepsilon_5/2)$-fraction of the prefixes $w_{i-1}$. Now, subject only to this constraint and maintaining the correct expected number of
variables in each vertex, the entropy is maximized by spreading the (£ 5 / 4 )-loss in frequency evenly 
across all other prefixes and variables. That is on the atypical prefixes, labels are assigned with 
probability (1 + £5) p/n, and with probability ^1 — Thus, 


y] H{Xi\X^i, y<i) + £5) p/n) + h- ^jni7Ml-- 


^i/4 

-ei/4 


p/n 


Recall from Fact 14.101 the expansion of the entropy function: 

'1 + v 


H 

Therefore, 

Y,H{Xi\X<i,Y. 


n 


= ( - ) - -log-- 

n J n n 


‘“S'^£i + 0{n-=)+o(^ 

n \ n 


<i) 


< -fin 
4 


n 


P 


P 


- -£5-log^- 


n 


+ 1 1 - j I n 


H 


n 


+ 


loge^ p o 

— • fcc 


n 




+ 0 {^eI 

n 


^1/4 

1 — £|/4 J n 


PlogP+o((P) ) +0(Pel) 

n n \\n/ J \n J 


= n 


H{^\- 
n. 


loge\ p 4 


— ■ et + O \ ( — 


n 


n 


+ 0(-£i 

n 


Recall that —21og ^ < logti. Thus for (^^) £5 > 14£i, we have that 


n 


((-] ) - O (-ei) > - ■ 12£i > --log - • 24£i/logn > (12£i/logn) (-) 
\\n/ J \n / n n n \n/ 


and therefore, 


$$\sum_i H(X_i \mid X_{<i}, Y_{<i}) < (1 - 12\varepsilon_1/\log n)\, n H\left(\frac{p}{n}\right) < (1 - 6\varepsilon_1/\log n) \log k,$$

where the second inequality follows from Fact 4.8. Thus we have reached a contradiction. Notice
that the ^ • £5 term of missing entropy is symmetric (but not the negligible higher order 

terms); i.e. the same derivation can be used to show a contradiction when many variables appear with probability less than $(1 - \varepsilon_5)\, p/n$. □

Definition 4.19. Let $\mathcal{X}(u,v)$ be defined as the number of $(i,j)$ pairs such that

• In the original 2CSP instance there exists an edge (constraint) between typical variables $X_i$ and $X_j$.

• $X_i = 1$ for $u$ and $X_j = 1$ for $v$.

• $u_{i-1}$ and $v_{j-1}$ are typical prefixes, where $u_{i-1}$ denotes the prefix represented by $u$ for $X_{<i}, Y_{<i}$, similarly for $v_{j-1}$.

Intuitively, $\mathcal{X}(u,v)$ is the number of "tests" of 2CSP-constraints between vertices $u, v$, when restricting to typical prefixes and variables. We now use the properties of typical prefixes and constraints to show that $\mathcal{X}(u,v)$ behaves "nicely".

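Claim 4.20. $\mathbb{E}_{u,v}[\mathcal{X}(u,v)] \ge (1 - \varepsilon_7)\, p^2/n$ and $\mathbb{E}_{u,v}[\mathcal{X}^2(u,v)] \le (1 + 2\varepsilon_7) \left( \mathbb{E}_{u,v}[\mathcal{X}(u,v)] \right)^2$,

where $\varepsilon_7$ is some constant $\varepsilon_7 \ge 6\varepsilon_5 + \Theta(\varepsilon_5^2)$.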

Proof. For any $i, j \in [n]$, we say that $i \in \mathcal{N}^{2CSP}(j)$ if there is a constraint on $(x_i, x_j)$. For the proof of this claim, we also abuse notation and denote $i \in u$ when $i$ is typical, $u_{i-1}$ is a typical prefix, and $X_i = 1$ for $u$. We also say that $i \in \mathcal{N}(u)$ if $i$ is a typical variable, $i \in \mathcal{N}^{2CSP}(j)$ and $j \in u$ (for some $j \in [n]$). (Do not confuse this notation with prefix neighborhood in the prefix graph.) We can now lower bound the expectation of $\mathcal{X}(u,v)$ as:

$$\mathbb{E}_{u,v}[\mathcal{X}(u,v)] \ge \mathbb{E}_u \left[ \sum_{i \in \mathcal{N}(u)} \Pr_v[i \in v] \right].$$

Notice that this bound may not be tight since any $i \in v$ can potentially have $d$ neighbors in $u$. Thus our upper bound is:

$$\mathbb{E}_{u,v}[\mathcal{X}(u,v)] \le d \cdot \mathbb{E}_u \left[ \sum_{i \in \mathcal{N}(u)} \Pr_v[i \in v] \right].$$

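By definition of typical variables, for each typical $i$, $i \in v$ with probability at least $(1 - \varepsilon_5)^2\, p/n$; thus,

$$\mathbb{E}_{u,v}[\mathcal{X}(u,v)] \ge \mathbb{E}_u \left[ \sum_{i \in \mathcal{N}(u)} (1 - \varepsilon_5)^2\, p/n \right] = (1 - \varepsilon_5)^2\, p/n \cdot \mathbb{E}_u\left[ |\mathcal{N}(u)| \right]. \tag{9}$$

All but $\varepsilon_5 n$ variables are typical, so all but $2\varepsilon_5 n$ variables are typical and have at least one typical neighbor. We restrict our attention to the set of such variables and fix one typical neighbor for each; this neighbor appears in $u$ with probability at least $(1 - \varepsilon_5)^2\, p/n$. Therefore,

$$\mathbb{E}_u\left[ |\mathcal{N}(u)| \right] \ge (1 - 2\varepsilon_5)\, n \cdot \left( (1 - \varepsilon_5)^2\, p/n \right) \ge (1 - 4\varepsilon_5)\, p. \tag{10}$$

Combining (9) and (10), we get the desired bound:

$$\mathbb{E}_{u,v}[\mathcal{X}(u,v)] \ge \left( (1 - \varepsilon_5)^2\, p/n \right) (1 - 4\varepsilon_5)\, p \ge (1 - \varepsilon_7)\, p^2/n. \tag{11}$$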
Similarly, for the variance we have


[Z^ {u, n)] < d? ■ Eu,v { ^ 1 

\i&vnj\f{u) 


= (f ■ E„ 


< (f ■ E„ 


E '+ E 1 

ij^j£vnN'{u) i€vnM{u) 


2 Pr [i G n] Pr [j G n I z E 

^ ^ V V 

i<j£jV{u) 


+ (f ■ E„,^ [I {u, n)] . 


Since for every prefix, each variable receives a typical assignment with probability at most (1 + £ 5 ) ■ 
p/n, we have that 


E„,^ (n, n)] < 2 (f ■ E^ 




n) 


+ cf ■ E„,^ [1 (n, n)] 


< ((1 + £ 5 ) • p/nf • 2d^ • E„ ( • E„,, [I {u, n)] 


We would like to bound E„ (’^ 2 ^^) ■ 


E,, 


^ ^ ^ Pr[A:Gu]Pr[/Gu| fcGu] 

' i<j keAf^^^P{i) 

E y^ Pr [A: G u] Pr h G tt I A: G u] 

U U 

i<j keJV^c:^Pip 
and k<i 

+E E Pr [/ G n] Pr [A: G u I ^ G u] 

*<i 

;g_y^2CSP(j) 

and k>i 

+ E E Pr[/^Gn] 

*<i fceA^2csp(j)n_yv2csp(j) 

For the first two summands, we can use the condition on the prefixes to conclude that 


(HI) + (HU < {^2)^ ((1 + ^5) • p/nY 

Whereas to bound the third summand we first change the order of summation: 


( 12 ) 


(13) 


(14) 


(15) 


(fT^ = Pr [A: G u] • I { (i, j) '■ i Y 3 k G TV^^CSP p j^ 2 CSP (j) 11 

k 

< ((1 + 65 ) • p) = O {p) 

Summing the last two inequalities, we have 

2 • E„ ((1 + £ 5 ) -pf + O ip) < (1 + £ 5 )' d^P^ 
Plugging back into (fT^ :

(n, n)] < {l + £^f (fp^/n^+ (f ■^u,v\^{u,v)] 

Using (11) and the fact that $p = \sqrt{n} \log\log n \gg \sqrt{n}$, this gives

E,,, [X2 (u, n)] < + [I (n, n)])' + (f • E„,, [I (n, n)] 

< (1 + 257) (E„,^ [I {u, n)])^ 

□ 


It will also be convenient to count the number of tests between a pair of variables. 

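Definition 4.21. For any pair of typical $(i,j) \in \psi$, let $\mathcal{X}(i,j)$ be defined as the number of $(u,v) \in (S \times S)$ pairs such that

• $X_i = 1$ for $u$ and $X_j = 1$ for $v$.

• $u_{i-1}$ and $v_{j-1}$ are typical prefixes, where $u_{i-1}$ denotes the prefix represented by $u$ for $X_{<i}, Y_{<i}$, similarly for $v_{j-1}$.

We now have two ways to count the total number of tests between typical prefixes to typical variables:

Observation 4.22. $\sum_{(u,v) \in (S \times S)} \mathcal{X}(u,v) = \sum_{\text{typical } (i,j) \in \psi} \mathcal{X}(i,j)$.

Furthermore, since $i$ and $j$ are typical, the number of tests between them also behaves "nicely":

Observation 4.23. For every typical $(i,j)$ we have $\mathcal{X}(i,j) \in |S|^2 p^2/n^2 \left[ (1 - \varepsilon_5)^4, (1 + \varepsilon_5)^2 \right]$.

Proof.

$$\begin{aligned}
\mathcal{X}(i,j) &= \left( |S| \cdot \sum_{\text{typical } u_{i-1}\text{'s}} \Pr[u_{i-1}] \Pr[X_i = 1 \mid u_{i-1}] \right) \left( |S| \cdot \sum_{\text{typical } v_{j-1}\text{'s}} \Pr[v_{j-1}] \Pr[X_j = 1 \mid v_{j-1}] \right) \\
&\in |S|^2 p^2/n^2 \left[ (1 - \varepsilon_5)^4, (1 + \varepsilon_5)^2 \right]
\end{aligned}$$

□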


Armed with Claims 4.18 and 4.20 and Observations 4.22 and 4.23, we are now ready to prove the main lemma of this section. Recall that the soundness of the 2CSP we started with is $1 - \eta$ for a small constant $\eta$.

Lemma 4.24. If J2i H{Xi\X^i,Y<^.i) > (l - i^) log/c, then 6{S) <1-6, where 6 < ^4^^1267) 
and £e = (?//2 - £ 5 ) (l/|A| 2 ) . 

Proof. Let the mode assignment be the assignment $A$ which assigns to each variable $x_i$, $i \in [n]$, its most common typical assignment (i.e. assignment after a typical prefix), breaking ties arbitrarily. In particular, at least $1/|A|$ of the typical assignments for $x_i$ are equal to $A(i)$. Of course, this assignment cannot satisfy more than a $(1 - \eta)$-fraction of the constraints in the original 2CSP; after removing the $\varepsilon_5 n$ atypical variables, $(\eta/2 - \varepsilon_5)\, dn$ constraints out of the $dn/2$ constraints must still be unsatisfied.

Recall that the number of tests for each constraint over typical variables, $\mathcal{X}(i,j)$, is approximately the same for every pair $(i,j)$, up to a multiplicative factor (Observation 4.23).

Therefore, the total fraction of tests over unsatisfied constraints, out of all tests, is approximately proportional to the fraction of unsatisfied constraints:


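$$\begin{aligned}
\sum_{\text{typical, unsatisfied } (i,j)\text{'s}} \mathcal{X}(i,j) &\ge \frac{(1 - \varepsilon_5)^4\, |\{\text{typical, unsatisfied } (i,j)\text{'s}\}|}{(1 + \varepsilon_5)^2\, |\{\text{typical } (i,j) \in \psi\}|} \sum_{\text{typical } (i,j) \in \psi} \mathcal{X}(i,j) \\
&\ge \frac{(1 - \varepsilon_5)^4}{(1 + \varepsilon_5)^2} \cdot \frac{(\eta/2 - \varepsilon_5)\, dn}{dn/2} \sum_{\text{typical } (i,j) \in \psi} \mathcal{X}(i,j) \\
&= \frac{(1 - \varepsilon_5)^4}{(1 + \varepsilon_5)^2}\, (\eta - 2\varepsilon_5) \cdot \sum_{(u,v) \in (S \times S)} \mathcal{X}(u,v) && \text{(Observation 4.22)}
\end{aligned}$$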


For each such pair $(i,j)$, on at least a $1/|A|^2$-fraction of the tests both variables receive the mode assignment, so the constraint is violated¹. Thus the total number of violations is at least

$$\varepsilon_6 \sum_{(u,v) \in (S \times S)} \mathcal{X}(u,v) \qquad \left( \text{where } \varepsilon_6 = (\eta/2 - \varepsilon_5) \left( 1/|A|^2 \right) \right).$$

Finally, we show that so many violations cannot concentrate on less than a $\delta$-fraction of the pairs $u, v \in S$; otherwise:

^ ^ 2 

1 


{u,v)e{SxS)\E 


> 


1 


d|5|2 

\S\^el 


^{u,v) 

yiu,v)e{SxS)\E 

£6 ^iu,v) 

{u,v)g{SxS) 

(IE„,^ [I{u,v)]f ; 


(Cauchy-Schwartz) 


yet by Claim 


2] X\u,v)< Y1 2:2(n,u) < (l + 2£7)d^|5|2(E„,4X(u,7;)])^ 

(u,v)&[SxS)\E (u,v)eSxS 

Thus we have a contradiction since d'^(l -|- 2 £ 7 ) < £g/d by our setting of 6 . Therefore we have 
2CSP-violations in more than a $\delta$-fraction of the pairs $u, v \in S$. □
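The Cauchy-Schwarz step in this argument uses the generic fact that a sum supported on few entries forces a large sum of squares: if at most $m$ entries are nonzero, then $\sum x^2 \ge (\sum x)^2 / m$. A minimal Python sketch of that inequality follows; the function name and the sample counts are hypothetical and only illustrate the inequality, not the quantities $\mathcal{X}(u,v)$ themselves.

```python
def concentrated_second_moment_bound(values, support_size):
    """Cauchy-Schwarz lower bound on sum(v*v) when at most `support_size`
    entries of `values` are nonzero: (sum v)^2 / support_size."""
    total = sum(values)
    return total * total / support_size

# Hypothetical test counts concentrated on 2 of 5 pairs.
counts = [4.0, 0.0, 0.0, 6.0, 0.0]
lower_bound = concentrated_second_moment_bound(counts, support_size=2)
second_moment = sum(c * c for c in counts)
print(second_moment >= lower_bound)
```

The smaller the support, the larger the forced second moment, which is what collides with the second-moment upper bound of Claim 4.20.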


With Lemma 4.16 and Lemma 4.24 we can now complete the proof of Theorem 4.1.

Theorem 4.1 (Soundness). If $\mathrm{OPT}(\psi) < 1 - \eta$, then $\forall S \subset V$ of size $k' = k \cdot |V|^{-\varepsilon_0/\log\log|V|}$,

$\mathrm{den}(S) < 1 - \delta$ for some constant $\delta$.

Proof. Recall that Xli + A = log A:' > (1 - j^)logA: by Fact ST) If A > (l^)logfc, 
then by Lemma 14.161 5(S) <1 — 5. Otherwise, if a * > (1 “ Lemma [1211 

5(S) <1-5. □ 

^We remark that a more careful analysis of the expected number of violations would allow one to save an |A|^-factor 
in the value of $\varepsilon_6$. Since it does not qualitatively affect the result, we opt for the simpler analysis.
A PCP theorem

Theorem 2.11 (PCP Theorem [Din07]). Given a 3SAT instance $\varphi$ of size $n$, there is a polynomial time reduction that produces a 2CSP instance $\psi$, with size $|\psi| = n \cdot \mathrm{polylog}\, n$ variables and constraints, and constant alphabet size such that

• (Completeness) If $\mathrm{OPT}(\varphi) = 1$ then $\mathrm{OPT}(\psi) = 1$.

• (Soundness) If $\mathrm{OPT}(\varphi) < 1$ then $\mathrm{OPT}(\psi) < 1 - \eta$, for some constant $\eta = \Omega(1)$.

• (Balance) Every vertex in $\psi$ has degree $d$ for some constant $d$.

Proof. We start with the following version of PCP of near linear size.

Theorem A.1 ([Din07], version as in [AIM14]). Given a 3SAT instance $\varphi$ of size $n$, there is a polynomial time reduction that produces a 3SAT instance $\varphi'$ with size $|\varphi'| = n \cdot \mathrm{polylog}\, n$ variables and constraints such that

• (Completeness) If $\mathrm{OPT}(\varphi) = 1$ then $\mathrm{OPT}(\varphi') = 1$.

• (Soundness) If $\mathrm{OPT}(\varphi) < 1$ then $\mathrm{OPT}(\varphi') < 1 - \varepsilon$, for some constant $0 < \varepsilon < 1/8$.

• (Balance) Every clause of $\varphi'$ involves exactly 3 variables, and every variable of $\varphi'$ appears in exactly $d$ clauses, for some constant $d$.

We use the following definition to reduce $\varphi'$ given by Theorem A.1 to a 2CSP instance $\psi$.

Definition A.2 ([AIM14], Clause/Variable game). Given a 3SAT instance $\varphi'$ with $n$ variables $x_1, \ldots, x_n$ and $m$ clauses $C_1, \ldots, C_m$, the clause/variable game is defined as follows: Referee chooses an index $i \in [m]$ uniformly at random, then chooses $j \in [n]$ uniformly at random conditioned on $x_j$ or $\bar{x}_j$ appearing in $C_i$ as a literal. He sends $i$ to Alice and $j$ to Bob. Referee accepts if and only if

• Alice sends back a satisfying assignment to the variables in $C_i$.

• Bob sends back a value for $x_j$ that agrees with the value sent by Alice.

In particular, we can think of the following explicit reduction.

1. $X = [m]$ represents clauses; $Y = [n]$ represents variables; $A = \{0,1\}^3$ represents assignment to all 3 variables in a clause; $B = \{0,1\}$ represents assignment to a singleton variable.

2. $(i,j) \in E \iff x_j$ or $\bar{x}_j$ appears in the $i$th clause ($C_i$).

3. The constraint on $(i,j)$ checks for the following:

• Assignment on $i \in [m]$ indeed satisfies the clause $C_i$ and,

• Assignment on $i \in [m]$ agrees with the assignment on $j \in [n]$.


The size blowup is indeed only constant, since we have a linear number of vertices and constant alphabet size. Also any vertex in $X$ has degree 3, and any vertex in $Y$ has degree $d$ since we started with Dinur's PCP. Completeness follows by assigning a satisfying assignment for 3SAT to this 2CSP. Soundness follows from the following claim:

Claim A.3 ([AIM14]). If $\mathrm{OPT}(\varphi') < 1 - \varepsilon$, then $\mathrm{OPT}(\psi) < 1 - \varepsilon/3$.

Proof. Consider fixing an assignment $x$ on $Y$. By our assumption on $\varphi'$, this violates the clause $C_i$ with probability at least $\varepsilon$ over $i$. And if $x$ violates $C_i$, regardless of assignments on $X$, at least one out of 3 edges of $i \in X$ is not satisfied. Therefore, at least an $\varepsilon/3$-fraction of the edges are violated, thus $\mathrm{OPT}(\psi) < 1 - \varepsilon/3$. □
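The clause/variable reduction and the counting in Claim A.3 can be illustrated on a toy instance. The following Python sketch brute-forces the value of the resulting 2CSP; the encoding of literals as signed integers, the function names, and the 8-clause example formula are assumptions made for illustration and do not come from the construction above.

```python
from itertools import product

def clause_satisfied(clause, values):
    """clause: tuple of nonzero ints; v > 0 means x_v, v < 0 means NOT x_|v|.
    values: dict mapping variable index -> 0/1."""
    return any((values[abs(lit)] == 1) == (lit > 0) for lit in clause)

def sat_value(clauses, num_vars):
    """Maximum fraction of clauses satisfied by any assignment (brute force)."""
    best = 0.0
    for bits in product((0, 1), repeat=num_vars):
        x = {v + 1: bits[v] for v in range(num_vars)}
        best = max(best, sum(clause_satisfied(c, x) for c in clauses) / len(clauses))
    return best

def game_value(clauses, num_vars):
    """Maximum fraction of clause/variable edges satisfied (brute force).
    Edge (i, j) is satisfied iff the label of clause i satisfies C_i and
    agrees with the value assigned to variable j."""
    total_edges = sum(len(c) for c in clauses)
    best = 0.0
    for bits in product((0, 1), repeat=num_vars):
        x = {v + 1: bits[v] for v in range(num_vars)}
        satisfied_edges = 0
        for clause in clauses:
            variables = [abs(lit) for lit in clause]
            best_for_clause = 0
            for label in product((0, 1), repeat=len(variables)):
                a = dict(zip(variables, label))
                if clause_satisfied(clause, a):
                    agree = sum(a[v] == x[v] for v in variables)
                    best_for_clause = max(best_for_clause, agree)
            satisfied_edges += best_for_clause
        best = max(best, satisfied_edges / total_edges)
    return best

# Toy unsatisfiable formula: all 8 sign patterns over variables 1, 2, 3.
all_sign_patterns = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
                     for signs in product((1, -1), repeat=3)]
print(sat_value(all_sign_patterns, 3), game_value(all_sign_patterns, 3))
```

On this toy formula every assignment violates exactly one of the 8 clauses, and a violated clause loses exactly one of its 3 edges, so the game value is $1 - (1/8)/3 = 23/24$, in line with the $\varepsilon/3$ loss in the claim.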


Now we add trivial constraints (i.e. always satisfying edges) between vertices in $X$ to make the overall graph $d$-regular (we lose the bipartite property, which is not necessary in our reduction). Take a regular graph on $X$ with degree $d - 3$. Add the edges with constraints on them as trivial constraints to our 2CSP instance generated via the reduction. Now the graph is indeed $d$-regular, and completeness is preserved since we only added trivial constraints. For soundness, we know that there are now a total of $3|X| + \frac{d-3}{2}|X|$ edges. Among them $\frac{d-3}{2}|X|$ are always satisfied. Out of $3|X|$ edges, at most a $(1 - \varepsilon/3)$ fraction of them are satisfied, i.e. $(3 - \varepsilon)|X|$ edges. So the fraction of satisfied edges is at most:

$$\mathrm{OPT}(\psi) \le \frac{(3 - \varepsilon)|X| + \frac{d-3}{2}|X|}{3|X| + \frac{d-3}{2}|X|} = \frac{d + 3 - 2\varepsilon}{d + 3} < 1 - \frac{\varepsilon}{d} := 1 - \eta$$

□
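The arithmetic in the last display can be verified exactly with rational numbers. This is a minimal Python sketch; the function name and the sample values of $d$ and $\varepsilon$ are assumptions for illustration.

```python
from fractions import Fraction

def regularized_value_bound(d, eps):
    """Upper bound on the satisfied fraction after adding (d - 3)/2 trivial
    edges per clause vertex: ((3 - eps) + (d - 3)/2) / (3 + (d - 3)/2)."""
    nontrivial = Fraction(3)        # clause/variable edges per clause vertex
    trivial = Fraction(d - 3, 2)    # always-satisfied edges per clause vertex
    return ((nontrivial - eps) + trivial) / (nontrivial + trivial)

d, eps = 5, Fraction(1, 10)         # hypothetical sample parameters
bound = regularized_value_bound(d, eps)
print(bound == (d + 3 - 2 * eps) / (d + 3), bound < 1 - eps / d)
```

The strict inequality against $1 - \varepsilon/d$ needs $d > 3$; at $d = 3$ no trivial edges are added and the two sides coincide.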
B Useful approximations

We recall some elementary approximations to logarithms and entropies that will be useful in the analysis.

Fact B.1. (Fact 4.8) If $k = \binom{n}{p}$ then,

$$\log k = nH\left(\frac{p}{n}\right) \pm O(\log n) = \left(\frac{1}{2} - o(1)\right) p \log n$$

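Proof. By Stirling's approximation, we have

$$\log n! = n \log n - (\log e)\, n + O(\log n).$$

Therefore the total entropy is given by

$$\begin{aligned}
\log k = \log \binom{n}{p} &= \log n! - \log p! - \log (n - p)! \\
&= n \log n - p \log p - (n - p) \log (n - p) \pm O(\log n) \\
&= nH\left(\frac{p}{n}\right) \pm O(\log n).
\end{aligned}$$

For small $\epsilon$, we have

$$\log(1 + \epsilon) = (\log e) \left( \epsilon - \frac{\epsilon^2}{2} + O(\epsilon^3) \right);$$

and in particular,

$$\log \frac{n - p}{n} = O\left(\frac{p}{n}\right).$$

Therefore,

$$\begin{aligned}
\log k &= p \cdot \log \frac{n}{p} + (n - p) \cdot \log \frac{n}{n - p} \pm O(\log n) \\
&= p \cdot \left(\frac{1}{2} - o(1)\right) \log n + (n - p) \cdot O\left(\frac{p}{n}\right) + O(\log n) \\
&= \left(\frac{1}{2} - o(1)\right) p \log n.
\end{aligned}$$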
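The first part of Fact B.1 is easy to probe numerically. The following Python sketch compares $\log_2\binom{n}{p}$ with $nH(p/n)$ and checks that they differ by at most $\log_2 n$; the specific $n$ and $p$ are hypothetical sample values, not the parameters of the construction.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log2(1.0 - x))

def entropy_gap(n, p):
    """|log2 C(n, p) - n * H(p / n)|, using exact integer binomials."""
    log_k = math.log2(math.comb(n, p))
    return abs(log_k - n * binary_entropy(p / n))

n, p = 10**6, 3000                  # hypothetical sample sizes
print(entropy_gap(n, p) <= math.log2(n))
```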


More useful to us will be the following bounds on $\log k'$:

Fact B.2. (Fact 4.9) Let $\varepsilon_1 > 5\varepsilon_0$, and $k, k', V, n, p$ as specified in the construction. Then,


□ 


log k' > max |log k, nH ~ log k/ log n. 


In particular, this means that most indices i should contribute roughly 9/ (^) entropy to the 
choice ofv. 










Proof. Observing that since $k = \binom{n}{p}$, we have

$$\log |V| = \log \binom{n}{p} + p \log |A| = (1 + o(1)) \log k. \tag{16}$$

We also have that

$$\log\log |V| = \log(1 + o(1)) + \log\log k \ge \log p \ge \frac{1}{2} \log n; \tag{17}$$

where the first inequality follows from Fact 4.8 and the second from the definition of $p$.
Finally, we have


log/c' = log A: — So log |F|/loglog |F| 

> log A; — eo(l + o(l)) log k /^ logn 

> log k — -ei log k/ log n 
Using Fact 14.81 completes the proof. 


(Using (fTUD and (fTTl) ') 
(Using £i > 5eo) 


□ 


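We will also need the following bound which relates the entropies of a very biased coin and a slightly less biased one:

Fact B.3. (Fact 4.10)

$$H\left(\frac{1 + \nu}{n}\right) = H\left(\frac{1}{n}\right) - \frac{\nu}{n} \log \frac{1}{n} - (\log e) \frac{\nu^2}{2n} + O(n^{-2}) + O\left(\frac{\nu^3}{n}\right)$$

Proof. By definition

$$H\left(\frac{1 + \nu}{n}\right) = -\frac{1 + \nu}{n} \log\left(\frac{1 + \nu}{n}\right) - \left(1 - \frac{1 + \nu}{n}\right) \log\left(1 - \frac{1 + \nu}{n}\right).$$

In order to relate this quantity to $H\left(\frac{1}{n}\right)$, we rewrite as:

$$\begin{aligned}
H\left(\frac{1 + \nu}{n}\right) = {} & -\frac{1}{n} \log \frac{1}{n} - \frac{\nu}{n} \log \frac{1}{n} - \frac{1 + \nu}{n} \underbrace{\log(1 + \nu)}_{(\log e)\left(\nu - \nu^2/2 + O(\nu^3)\right)} \\
& - \left(1 - \frac{1}{n}\right) \log\left(1 - \frac{1}{n}\right) + \underbrace{\frac{\nu}{n} \log\left(1 - \frac{1}{n}\right)}_{O(n^{-2})} - \left(1 - \frac{1 + \nu}{n}\right) \underbrace{\log\left(\frac{1 - \frac{1 + \nu}{n}}{1 - \frac{1}{n}}\right)}_{(\log e)\left(-(\nu/n) - O(\nu/n^2)\right)} \\
= {} & H\left(\frac{1}{n}\right) - \frac{\nu}{n} \log \frac{1}{n} - (\log e) \frac{\nu^2}{2n} + O(n^{-2}) + O\left(\frac{\nu^3}{n}\right).
\end{aligned}$$

□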
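The expansion in Fact B.3 can be checked numerically for moderate $\nu$. The sketch below is a hedged illustration: the function names and the sample values of $n$ and $\nu$ are assumptions, and the tolerance $|\nu|^3/n$ is just one convenient stand-in for the $O(n^{-2}) + O(\nu^3/n)$ error terms.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, using log1p for accuracy near x = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1.0 - x) * math.log1p(-x) / math.log(2.0))

def expansion(n, nu):
    """Main terms: H(1/n) - (nu/n) log(1/n) - (log e) nu^2 / (2n)."""
    log_e = math.log2(math.e)
    return binary_entropy(1.0 / n) - (nu / n) * math.log2(1.0 / n) - log_e * nu * nu / (2.0 * n)

def expansion_error(n, nu):
    """Absolute gap between H((1 + nu)/n) and the main terms of the expansion."""
    return abs(binary_entropy((1.0 + nu) / n) - expansion(n, nu))

n, nu = 10**6, 0.1                  # hypothetical sample values
print(expansion_error(n, nu) <= abs(nu) ** 3 / n)
```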


C Small constants in the proof of Theorem 4.1

To help verify the correctness of the proof, we concentrate all the definitions of the small $\varepsilon$'s used in the following list:

• $\varepsilon_0 < \varepsilon_1/5$

• $\varepsilon_1 > 4\varepsilon_2 \log|A| + 8\varepsilon_3$


4 


82- 82 < 0 . 2 , 5 — 


• $\varepsilon_3 > \varepsilon_4 \log|A| - \varepsilon_4 \log \varepsilon_4 - (1 - \varepsilon_4) \log(1 - \varepsilon_4)$

• $\varepsilon_4 = \omega(n/p^2)$

• £ 5 : (!T) 4 > Wei 

• £ 6 : £6 ^ (^?/2 - £ 5 ) ( l /|£ lP ) ^*4 + 2 £ 7 ) < 4 ,/^ 

• $\varepsilon_7$: $\varepsilon_7 \ge 6\varepsilon_5 + \Theta(\varepsilon_5^2)$